TEST: include icechunks zarr tree strategies here#3914
TEST: include icechunks zarr tree strategies here#3914ianhi wants to merge 4 commits intozarr-developers:mainfrom
icechunks zarr tree strategies here#3914Conversation
…trategy Introduces declarative tree descriptors in `zarr.testing.models` (ArrayNode, GroupNode) with `materialize()` / `from_store()` round-trip helpers, and a hypothesis `trees()` strategy in `zarr.testing.strategies` that generates realistic zarr hierarchies with prefix-colliding names (sibling `foo`/`foo-bar`, cousin name reuse). `trees()` accepts `case_insensitive: bool | None` to keep sibling names from differing only in letter case when targeting stores backed by a case-insensitive filesystem (macOS APFS, Windows NTFS default). Defaults to the current platform's filesystem behavior. `materialize()` accepts an optional `zarr_format` to exercise v2 and v3 from the same tree descriptor.
Two property tests exercising zarr hierarchy code paths against the new trees() hypothesis strategy: - test_groups_and_arrays_split_children — verifies that Group.groups() and Group.arrays() return exactly the sub-groups and arrays respectively, across random hierarchies with prefix-colliding sibling names. Uses a MemoryStore since the partition logic is store-agnostic; local/zip coverage is already carried by the sibling imperative test. - test_group_members_tree_roundtrip — materialize() / from_store() round-trip for any GroupNode tree. Each test pins a hand-picked shape via @example so the known-good case runs alongside the random draws.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3914 +/- ##
==========================================
- Coverage 93.10% 92.97% -0.14%
==========================================
Files 85 86 +1
Lines 11193 11324 +131
==========================================
+ Hits 10421 10528 +107
- Misses 772 796 +24
🚀 New features to boost your workflow:
|
| class GroupNode: | ||
| children: dict[str, ArrayNode | GroupNode] = field(default_factory=dict) | ||
|
|
||
| def walk(self, prefix: str = "") -> Iterator[tuple[str, Node]]: |
There was a problem hiding this comment.
i feel like "iter_nodes"or similar is a bit more expressive than "walk"
|
Thanks for this! I feel like if these array / group hierarchy methods are useful for testing, then they are just useful, and we should consider integrating them into the codebase. With that in mind I'm gonna leave some comments but they are just advisory, feel free to ignore. |
| if isinstance(child, GroupNode): | ||
| yield from child.walk(p) | ||
|
|
||
| def nodes(self, prefix: str = "", *, include_root: bool = False) -> list[str]: |
There was a problem hiding this comment.
all these iteration routines can do IO. so returning list[stuff] requires doing all the IO before the first value is available. Iterator[stuff] or Generator[stuff] gives us a bit more flexibility if we want to stream results instead of doing all the IO up-front.
There was a problem hiding this comment.
(this applies in a world where we put these routines on the group class that's bound to a store)
|
|
||
|
|
||
| @dataclass(frozen=True) | ||
| class GroupNode: |
There was a problem hiding this comment.
this approach to "structurally model a group" reminds me of GroupSpec in pydantic-zarr. this is clearly a useful representation, so we should figure out how we can fit it into zarr-python proper
|
|
||
| GroupNode.from_store(some_memory_store) | ||
| """ | ||
| return sync(cls.from_store_async(store)) |
There was a problem hiding this comment.
using sync here forces any consumer to use the zarr-python event loop. flagging this for visibility
| zarr_format=st.sampled_from([2, 3]), | ||
| ) | ||
| @pytest.mark.filterwarnings("ignore:Consolidated metadata:zarr.errors.ZarrUserWarning") | ||
| def test_groups_and_arrays_split_children( |
There was a problem hiding this comment.
was there bugged behavior that this test was necessary to fix?
Up streaming our zarr tree hypothesis strategy from icechunk.
generates trees like these:
Two commits
docs/user-guide/*.mdchanges/